BioBlend with IPython

BioBlend is a Python library for working with Galaxy. It provides a high-level API for launching and terminating Galaxy cloud instances from Python, and it also supports loading and running Galaxy workflows with a few simple calls.
BioBlend: https://github.com/afgane/bioblend

An example for using BioBlend:

from bioblend.galaxy import GalaxyInstance
gi = GalaxyInstance('<Galaxy IP>', key='your API key')
libs = gi.libraries.get_libraries()
gi.workflows.show_workflow('workflow ID')
gi.workflows.run_workflow('workflow ID', input_dataset_map)

Run a Galaxy workflow through IPython Notebook

This tutorial shows how to use the BioBlend library with workflow examples, so that Python developers can use Galaxy tools from Python and IPython.

Create Galaxy Instance using BioBlend

This step connects to a Galaxy server using BioBlend. A server URL and an API key are required to connect.
The server URL is the address of the installed Galaxy server, and the API key identifies a user.

We have installed Galaxy on the same local machine as IPython, so the Galaxy URL is the local IP address with the default port number 8080, e.g. http://127.0.0.1:8080.
If you want to use a different Galaxy server, e.g. the public Galaxy server hosted by Penn State University, use https://main.g2.bx.psu.edu/.
For the galaxy_api_key, a Galaxy user needs to get the string from here: http://[galaxy_server]/user/api_keys?cntrller=user
It works like a password, so only logged-in users can obtain the key. Here, I used the key d8699f27a08cc6f42a065e39955b6c47 for my account on the local Galaxy server.


In [66]:
from bioblend.galaxy import GalaxyInstance

galaxy_url = "http://127.0.0.1:8080"
galaxy_api_key = "d8699f27a08cc6f42a065e39955b6c47"
gi = GalaxyInstance(url=galaxy_url, key=galaxy_api_key)
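
Because the API key works like a password, it may be preferable not to hard-code it in a notebook that will be shared. A minimal sketch, assuming the hypothetical environment variable names GALAXY_URL and GALAXY_API_KEY (neither Galaxy nor BioBlend defines them), reads both values from a mapping such as os.environ and falls back to the local URL used above:

```python
import os

# Hypothetical variable names chosen for this sketch; Galaxy and BioBlend
# do not define them.
URL_VAR = "GALAXY_URL"
KEY_VAR = "GALAXY_API_KEY"


def load_galaxy_credentials(env=None, default_url="http://127.0.0.1:8080"):
    """Return (url, key) read from a mapping such as os.environ."""
    env = os.environ if env is None else env
    url = env.get(URL_VAR, default_url).rstrip("/")
    key = env.get(KEY_VAR)
    if not key:
        raise ValueError("Set %s to your Galaxy API key" % KEY_VAR)
    return url, key


# Usage with an explicit mapping, so the example runs without a real key.
galaxy_url, galaxy_api_key = load_galaxy_credentials({KEY_VAR: "your API key"})
print(galaxy_url)
```

The resulting pair can be passed to GalaxyInstance(url=galaxy_url, key=galaxy_api_key) in the same way as the hard-coded values.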

Test a connection with get_histories()

Once the connection is established, retrieving the Galaxy histories is a good way to test that it is working.
get_histories() returns a list of the histories that belong to the logged-in user.


In [69]:
hl = gi.histories.get_histories()
hl


Out[69]:
[{u'deleted': False,
  u'id': u'df7a1f0c02a5b08e',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': True,
  u'tags': [],
  u'url': u'/api/histories/df7a1f0c02a5b08e'},
 {u'deleted': False,
  u'id': u'5969b1f7201f12ae',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': False,
  u'tags': [],
  u'url': u'/api/histories/5969b1f7201f12ae'},
 {u'deleted': False,
  u'id': u'a799d38679e985db',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': False,
  u'tags': [],
  u'url': u'/api/histories/a799d38679e985db'},
 {u'deleted': False,
  u'id': u'33b43b4e7093c91f',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': False,
  u'tags': [],
  u'url': u'/api/histories/33b43b4e7093c91f'},
 {u'deleted': False,
  u'id': u'ebfb8f50c6abde6d',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': False,
  u'tags': [],
  u'url': u'/api/histories/ebfb8f50c6abde6d'}]
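
The returned value is a plain list of dictionaries, so ordinary Python is enough to pick out the history you need. A minimal sketch, using two abbreviated entries copied from the output above in place of a live server call (the helper name published_history_ids is hypothetical):

```python
def published_history_ids(histories):
    """Return the ids of histories that are published and not deleted."""
    return [h["id"] for h in histories
            if h["published"] and not h["deleted"]]


# Two abbreviated entries copied from the get_histories() output above.
sample_histories = [
    {"deleted": False, "id": "df7a1f0c02a5b08e",
     "name": "Unnamed history", "published": True},
    {"deleted": False, "id": "5969b1f7201f12ae",
     "name": "Unnamed history", "published": False},
]

print(published_history_ids(sample_histories))  # ['df7a1f0c02a5b08e']
```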

Note.

Galaxy's analysis tools each have an id, for example CONVERTER_interval_to_bedstrict_0.
The JSON workflow file stores this id in the "tool_id" field, e.g. https://gist.github.com/lee212/f1449352334a2268b849
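
To list the tool ids that a workflow uses, the exported JSON can be read directly. The sketch below assumes the file has a top-level "steps" object whose values carry a "tool_id" field; the embedded JSON is a made-up, heavily abbreviated stand-in for a real export, and collect_tool_ids is a hypothetical helper.

```python
import json


def collect_tool_ids(workflow_json):
    """Return the sorted, unique tool ids found in a workflow JSON string."""
    workflow = json.loads(workflow_json)
    tool_ids = set()
    for step in workflow.get("steps", {}).values():
        tool_id = step.get("tool_id")
        if tool_id:  # data input steps carry no tool id
            tool_ids.add(tool_id)
    return sorted(tool_ids)


# Made-up, abbreviated stand-in for an exported workflow file.
example = """
{"name": "example",
 "steps": {"0": {"tool_id": null, "type": "data_input"},
           "1": {"tool_id": "gops_join_1", "type": "tool"},
           "2": {"tool_id": "sort1", "type": "tool"}}}
"""

print(collect_tool_ids(example))  # ['gops_join_1', 'sort1']
```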

List saved workflows

BioBlend supports basic functions to load and run workflows. get_workflows() returns a list of the workflows that a Galaxy user has.


In [14]:
workflows = gi.workflows.get_workflows()
workflows


Out[14]:
[{u'id': u'f597429621d6eb2b',
  u'model_class': u'StoredWorkflow',
  u'name': u"Workflow constructed from history 'Unnamed history'",
  u'published': True,
  u'tags': [],
  u'url': u'/api/workflows/f597429621d6eb2b'},
 {u'id': u'1cd8e2f6b131e891',
  u'model_class': u'StoredWorkflow',
  u'name': u'Galaxy 101 (imported from uploaded file)',
  u'published': False,
  u'tags': [],
  u'url': u'/api/workflows/1cd8e2f6b131e891'}]
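
Selecting a workflow by name rather than by list position can make a notebook less sensitive to how many workflows an account holds. A minimal sketch with a hypothetical helper, find_workflow, run here on abbreviated entries copied from the output above:

```python
def find_workflow(workflows, name_fragment):
    """Return the single workflow whose name contains name_fragment."""
    matches = [w for w in workflows if name_fragment in w["name"]]
    if len(matches) != 1:
        raise LookupError("expected 1 match for %r, found %d"
                          % (name_fragment, len(matches)))
    return matches[0]


# Abbreviated entries copied from the get_workflows() output above.
sample_workflows = [
    {"id": "f597429621d6eb2b",
     "name": "Workflow constructed from history 'Unnamed history'"},
    {"id": "1cd8e2f6b131e891",
     "name": "Galaxy 101 (imported from uploaded file)"},
]

print(find_workflow(sample_workflows, "Galaxy 101")["id"])  # 1cd8e2f6b131e891
```

Raising an error when zero or several names match avoids silently running the wrong workflow.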

Retrieve workflow information

There are two workflows stored in the database. Let's select the second workflow, named 'Galaxy 101', and see what components it has.
show_workflow() returns detailed information about a workflow, such as its id, inputs, and steps.


In [72]:
workflow = workflows[1]
res = gi.workflows.show_workflow(workflow['id'])
res


Out[72]:
{u'id': u'1cd8e2f6b131e891',
 u'inputs': {u'29': {u'label': u'Features', u'value': u''},
  u'30': {u'label': u'Exons', u'value': u''}},
 u'model_class': u'StoredWorkflow',
 u'name': u'Galaxy 101 (imported from uploaded file)',
 u'published': False,
 u'steps': {u'24': {u'id': 24,
   u'input_steps': {u'input1': {u'source_step': 30, u'step_output': u'output'},
    u'input2': {u'source_step': 29, u'step_output': u'output'}},
   u'tool_id': u'gops_join_1',
   u'type': u'tool'},
  u'25': {u'id': 25,
   u'input_steps': {u'input': {u'source_step': 26,
     u'step_output': u'out_file1'}},
   u'tool_id': u'sort1',
   u'type': u'tool'},
  u'26': {u'id': 26,
   u'input_steps': {u'input1': {u'source_step': 24,
     u'step_output': u'output'}},
   u'tool_id': u'Grouping1',
   u'type': u'tool'},
  u'27': {u'id': 27,
   u'input_steps': {u'input1': {u'source_step': 30, u'step_output': u'output'},
    u'input2': {u'source_step': 28, u'step_output': u'out_file1'}},
   u'tool_id': u'comp1',
   u'type': u'tool'},
  u'28': {u'id': 28,
   u'input_steps': {u'input': {u'source_step': 25,
     u'step_output': u'out_file1'}},
   u'tool_id': u'Show beginning1',
   u'type': u'tool'},
  u'29': {u'id': 29,
   u'input_steps': {},
   u'tool_id': None,
   u'type': u'data_input'},
  u'30': {u'id': 30,
   u'input_steps': {},
   u'tool_id': None,
   u'type': u'data_input'}},
 u'tags': [],
 u'url': u'/api/workflows/1cd8e2f6b131e891'}

Run workflow

run_workflow() executes the workflow on the given input datasets and places the results in the selected history.
It returns the ids of the output datasets, which hold the results of each step in the workflow.


In [49]:
# Keys are the workflow input step ids; 'hda' marks a history dataset.
dataset_map = {'30': {'id': 'cbbbf59e8f08c98c', 'src': 'hda'},
               '29': {'id': '964b37715ec9bd22', 'src': 'hda'}}
outputs = gi.workflows.run_workflow(workflow['id'], dataset_map,
                                    history_id='df7a1f0c02a5b08e')
outputs


Out[49]:
{u'history': u'df7a1f0c02a5b08e',
 u'outputs': [u'6fc9fbb81c497f69',
  u'6fb17d0cc6e8fae5',
  u'5114a2a207b7caff',
  u'06ec17aefa2d49dd',
  u'b8a0d6158b9961df']}
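
The keys of dataset_map ('29' and '30') are the input step ids reported under 'inputs' by show_workflow(). Rather than typing them by hand, they can be looked up by label. A minimal sketch with a hypothetical helper, build_dataset_map; the pairing of 'Exons' with 'cbbbf59e8f08c98c' and 'Features' with '964b37715ec9bd22' follows the cell above.

```python
def build_dataset_map(workflow_inputs, datasets_by_label, src="hda"):
    """Map workflow input step ids to datasets, matching on input labels."""
    dataset_map = {}
    for step_id, info in workflow_inputs.items():
        label = info["label"]
        if label not in datasets_by_label:
            raise KeyError("no dataset given for input %r" % label)
        dataset_map[step_id] = {"id": datasets_by_label[label], "src": src}
    return dataset_map


# The 'inputs' entry from the show_workflow() output above.
workflow_inputs = {"29": {"label": "Features", "value": ""},
                   "30": {"label": "Exons", "value": ""}}

dataset_map = build_dataset_map(
    workflow_inputs,
    {"Exons": "cbbbf59e8f08c98c", "Features": "964b37715ec9bd22"})
print(dataset_map["30"])
```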

Two input datasets are used, and one of them is 'UCSC Main on Human: knownGene (chr22:1-51304566)'.
Passing its id 'cbbbf59e8f08c98c' to show_dataset() displays detailed information about the input dataset.


In [56]:
dataset = gi.datasets.show_dataset('cbbbf59e8f08c98c')
dataset


Out[56]:
{u'accessible': True,
 u'api_type': u'file',
 u'data_type': u'bed',
 u'deleted': False,
 u'display_apps': [{u'label': u'display in IGB',
   u'links': [{u'href': u'/display_application/cbbbf59e8f08c98c/igb_bed/Local',
     u'target': u'_blank',
     u'text': u'Local'},
    {u'href': u'/display_application/cbbbf59e8f08c98c/igb_bed/Web',
     u'target': u'_blank',
     u'text': u'Web'}]},
  {u'label': u'display at Ensembl',
   u'links': [{u'href': u'/display_application/cbbbf59e8f08c98c/ensembl_interval/ensembl_Current',
     u'target': u'_blank',
     u'text': u'Current'}]},
  {u'label': u'display at RViewer',
   u'links': [{u'href': u'/display_application/cbbbf59e8f08c98c/rviewer_interval/lbl_main',
     u'target': u'_blank',
     u'text': u'main'}]}],
 u'display_types': [],
 u'download_url': u'/api/histories/df7a1f0c02a5b08e/contents/cbbbf59e8f08c98c/display',
 u'file_ext': u'bed',
 u'file_size': 797714,
 u'genome_build': u'hg19',
 u'hda_ldda': u'hda',
 u'hid': 1,
 u'history_id': u'df7a1f0c02a5b08e',
 u'id': u'cbbbf59e8f08c98c',
 u'metadata_chromCol': 1,
 u'metadata_column_names': None,
 u'metadata_column_types': [u'str', u'int', u'int', u'str', u'int', u'str'],
 u'metadata_columns': 6,
 u'metadata_comment_lines': None,
 u'metadata_data_lines': 12410,
 u'metadata_dbkey': u'hg19',
 u'metadata_endCol': 3,
 u'metadata_nameCol': 4,
 u'metadata_startCol': 2,
 u'metadata_strandCol': 6,
 u'metadata_viz_filter_cols': [4],
 u'misc_blurb': u'12,410 regions',
 u'misc_info': u'',
 u'model_class': u'HistoryDatasetAssociation',
 u'name': u'UCSC Main on Human: knownGene (chr22:1-51304566)',
 u'peek': u'<table cellspacing="0" cellpadding="3"><tr><th>1.Chrom</th><th>2.Start</th><th>3.End</th><th>4.Name</th><th>5</th><th>6.Strand</th></tr><tr><td>chr22</td><td>16258185</td><td>16258303</td><td>uc002zlh.1_cds_1_0_chr22_16258186_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16266928</td><td>16267095</td><td>uc002zlh.1_cds_2_0_chr22_16266929_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16268136</td><td>16268181</td><td>uc002zlh.1_cds_3_0_chr22_16268137_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16269872</td><td>16269943</td><td>uc002zlh.1_cds_4_0_chr22_16269873_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16275206</td><td>16275277</td><td>uc002zlh.1_cds_5_0_chr22_16275207_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16277747</td><td>16277885</td><td>uc002zlh.1_cds_6_0_chr22_16277748_r</td><td>0</td><td>-</td></tr></table>',
 u'purged': False,
 u'state': u'ok',
 u'uuid': None,
 u'visible': True,
 u'visualizations': [u'trackster', u'circster', u'scatterplot']}
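
The 'state' field shown above ('ok' here) can also be checked for the output datasets before displaying them, since workflow jobs may still be running when the output ids come back. The sketch below is an illustration under stated assumptions: wait_until_done is a hypothetical helper, it treats 'ok' and 'error' as the only terminal states, and it takes the lookup function as an argument so that it can be tried with a stand-in instead of a live server.

```python
import time


def wait_until_done(show_dataset, dataset_ids, poll_seconds=2.0, timeout=600.0):
    """Poll show_dataset(id) until every dataset reaches 'ok' or 'error'.

    Assumption: 'ok' and 'error' are treated as the only terminal states.
    Returns a dict mapping each dataset id to its final state.
    """
    deadline = time.time() + timeout
    states = {}
    remaining = list(dataset_ids)
    while remaining:
        still_running = []
        for dataset_id in remaining:
            state = show_dataset(dataset_id)["state"]
            if state in ("ok", "error"):
                states[dataset_id] = state
            else:
                still_running.append(dataset_id)
        remaining = still_running
        if remaining:
            if time.time() > deadline:
                raise RuntimeError("timed out waiting for %s" % remaining)
            time.sleep(poll_seconds)
    return states


# Stand-in for gi.datasets.show_dataset: reports 'running' once, then 'ok'.
calls = {}


def fake_show_dataset(dataset_id):
    calls[dataset_id] = calls.get(dataset_id, 0) + 1
    return {"state": "ok" if calls[dataset_id] > 1 else "running"}


print(wait_until_done(fake_show_dataset, ["6fc9fbb81c497f69"], poll_seconds=0))
```

With a live connection, gi.datasets.show_dataset could be passed as the first argument in place of the stand-in.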

Display outputs in IPython HTML


In [89]:
from IPython.display import HTML

In [116]:
merged_htmls = ""

# Collect the name and HTML preview ('peek') of every output dataset.
for output in outputs['outputs']:
    dataset = gi.datasets.show_dataset(output)
    name = dataset['name']
    html = dataset['peek']
    merged_htmls += "<p><b>%s</b>" % name + html + "</p>"

HTML(merged_htmls)


Out[116]:

Join on data 2 and data 1

1.Chrom  2.Start   3.End     4.Name                               5  6.Strand  7      8         9         10          11  12
chr22    16258185  16258303  uc002zlh.1_cds_1_0_chr22_16258186_r  0  -         chr22  16258278  16258279  rs2845178   0   +
chr22    16266928  16267095  uc002zlh.1_cds_2_0_chr22_16266929_r  0  -         chr22  16267031  16267032  rs7292200   0   +
chr22    16266928  16267095  uc002zlh.1_cds_2_0_chr22_16266929_r  0  -         chr22  16266963  16266964  rs10154680  0   +
chr22    16266928  16267095  uc002zlh.1_cds_2_0_chr22_16266929_r  0  -         chr22  16267011  16267012  rs7290262   0   +
chr22    16266928  16267095  uc002zlh.1_cds_2_0_chr22_16266929_r  0  -         chr22  16267037  16267038  rs2818572   0   +
chr22    16269872  16269943  uc002zlh.1_cds_4_0_chr22_16269873_r  0  -         chr22  16269933  16269934  rs2845206   0   +

Group on data 8

1                                    2
uc002zlh.1_cds_1_0_chr22_16258186_r  1
uc002zlh.1_cds_2_0_chr22_16266929_r  4
uc002zlh.1_cds_4_0_chr22_16269873_r  1
uc002zlh.1_cds_5_0_chr22_16275207_r  2
uc002zlh.1_cds_6_0_chr22_16277748_r  5
uc002zlh.1_cds_7_0_chr22_16279195_r  2

Sort on data 9

1                                    2
uc010gsw.2_cds_1_0_chr22_21480537_r  67
uc021wmb.1_cds_0_0_chr22_21480537_r  67
uc002zoc.3_cds_0_0_chr22_18834445_f  58
uc021wnd.1_cds_0_0_chr22_24647973_f  50
uc021wmc.1_cds_0_0_chr22_21637809_f  47
uc003bhh.3_cds_0_0_chr22_46652458_r  46

Select first on data 10

1                                    2
uc010gsw.2_cds_1_0_chr22_21480537_r  67
uc021wmb.1_cds_0_0_chr22_21480537_r  67
uc002zoc.3_cds_0_0_chr22_18834445_f  58
uc021wnd.1_cds_0_0_chr22_24647973_f  50
uc021wmc.1_cds_0_0_chr22_21637809_f  47

top 5 exons

1.Chrom  2.Start   3.End     4.Name                               5  6.Strand
chr22    18834444  18835833  uc002zoc.3_cds_0_0_chr22_18834445_f  0  +
chr22    21480536  21481925  uc010gsw.2_cds_1_0_chr22_21480537_r  0  -
chr22    21480536  21481925  uc021wmb.1_cds_0_0_chr22_21480537_r  0  -
chr22    21637808  21638558  uc021wmc.1_cds_0_0_chr22_21637809_f  0  +
chr22    24647972  24649256  uc021wnd.1_cds_0_0_chr22_24647973_f  0  +
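
The display loop inserts each dataset name into the HTML unescaped. Names containing characters such as '<' or '&' could then break the markup, so a slightly more defensive variant escapes them first. This sketch uses html.escape from the Python 3 standard library; render_peeks is a hypothetical helper that takes the dataset dictionaries as input, so it can be tried without a server, and the sample entry is made up.

```python
import html


def render_peeks(datasets):
    """Build one HTML fragment with a bold title and preview per dataset."""
    parts = []
    for dataset in datasets:
        title = html.escape(dataset["name"])
        # 'peek' is already an HTML table produced by Galaxy; keep it as is.
        parts.append("<div><b>%s</b>%s</div>" % (title, dataset["peek"]))
    return "".join(parts)


# Made-up sample entry with a character that needs escaping in the name.
sample = [{"name": "Join on data 2 & data 1",
           "peek": "<table><tr><td>chr22</td></tr></table>"}]
print(render_peeks(sample))
```

In the notebook, the returned string would be wrapped as HTML(render_peeks(...)) to render it.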

We successfully executed the workflow on Python with BioBlend and displayed the results on IPython.